Compiling Deep Learning Models for Custom Hardware Accelerators
نویسندگان
چکیده
Convolutional neural networks (CNNs) are the core of most state-of-the-art deep learning algorithms specialized for object detection and classification. CNNs are both computationally complex and embarrassingly parallel. Two properties that leave room for potential software and hardware optimizations for embedded systems. Given a programmable hardware accelerator with a CNN oriented custom instructions set, the compiler’s task is to exploit the hardware’s full potential, while abiding with the hardware constraints and maintaining generality to run different CNN models with varying workload properties. Snowflake is an efficient and scalable hardware accelerator implemented on programmable logic devices. It implements a control pipeline for a custom instruction set. The goal of this paper is to present Snowflake’s compiler that generates machine level instructions from Torch7 model description files. The main software design points explored in this work are: model structure parsing, CNN workload breakdown, loop rearrangement for memory bandwidth optimizations and memory access balancing. The performance achieved by compiler generated instructions matches against hand optimized code for convolution layers. Generated instructions also efficiently execute AlexNet and ResNet18 inference on Snowflake. Snowflake with 256 processing units was synthesized on Xilinx’s Zynq XC7Z045 FPGA. At 250 MHz, AlexNet achieved in 93.6 frames/s and 1.2 GB/s of off-chip memory bandwidth, and 21.4 frames/s and 2.2 GB/s for ResNet18. Total on-chip power is 5 W.
منابع مشابه
Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks
Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks Convolutional neural networks (CNN) have achieved major breakthroughs in recent years. Their performance in computer vision have matched and in some areas even surpassed human capabilities. Deep neural networks can capture complex non-linear features; however this ability comes at the cost of high computational and memo...
متن کاملSnowflake: A Model Agnostic Accelerator for Deep Convolutional Neural Networks
Deep convolutional neural networks (CNNs) are the deep learning model of choice for performing object detection, classification, semantic segmentation and natural language processing tasks. CNNs require billions of operations to process a frame. This computational complexity, combined with the inherent parallelism of the convolution operation make CNNs an excellent target for custom accelerator...
متن کاملSynergy: A HW/SW Framework for High Throughput CNNs on Embedded Heterogeneous SoC
Convolutional Neural Networks (CNN) have been widely deployed in diverse application domains. There has been significant progress in accelerating both their training and inference using high-performance GPUs, FPGAs, and custom ASICs for datacenter-scale environments. The recent proliferation of mobile and IoT devices have necessitated real-time, energy-efficient deep neural network inference on...
متن کاملDesign Space Exploration of FPGA-Based Deep Convolutional Neural Networks
Deep Convolutional Neural Networks (DCNN) have proven to be very effective in many pattern recognition applications, such as image classification and speech recognition. Due to their computational complexity, DCNNs demand implementations that utilize custom hardware accelerators to meet performance and energy-efficiency constraints. Leverages all sources of parallelism in DCNNs, in this paper w...
متن کاملA Code Optimization Framework for Performance Portability of GPU Kernels onto Custom Accelerators
The shift toward parallel computing has resulted into a growing interest in computing systems with heterogeneous processing modules. Reconfigurable devices are often employed in such heterogeneous systems due to their low power and parallel processing benefits. An important issue in the programmability of these systems is the need for a single programming interface. Recent works have leveraged ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1708.00117 شماره
صفحات -
تاریخ انتشار 2017